This is me


• Long ago: PhD in CS, quant trading, credit scoring
• Past: Search & personalization for ~7 years
• Now: Full-time open-source contributor


Not [only] about search 


Not [only] about e-commerce


Not [only] static 


Learn-to-rank, again? 


WE NEED TO TUNE FIELD WEIGHTS!


• Low-hanging fruit, existing tooling
• poke - a/b test - poke - a/b test


• Iterative a/b tests take a lot of time
• More weights = more problems


JUST MULTIPLY BM25 WITH CTR!


TOO RISKY! 


• Learn-to-rank needs a myriad of MLOps things
• Long project, no experience, no tooling = high risk


• BM25 * CTR = quick feedback
• LTR = €


• BERT, HNSWlib & FAISS are from 2018
• Existing tooling made it approachable


LTR: a high risk investment


• team: ML/MLOps experience
• time: 6+ months, not guaranteed to succeed
• tooling: custom, in-house


Are my ranking factors unique?


me implementing CTR feature 5th time in a year


• UA, Referer, GeoIP
• query-field matching, item metadata
• counters, CTR, visitor profile


Is my data setup unique? 


• data model: clicks, impressions, metadata
• feature engineering: compute and logging
• feature store: judgement lists, history replay, bootstrap
• typical LTR ML models: LambdaMART


• cover 90% of typical tasks in 10% of the time?


Metarank 


A Swiss Army knife of re-ranking


ecommerce, ads, content


A secondary re-ranker 


[Diagram: visitor -> search -> Metarank re-rank -> better ranking :)]


Inside Metarank 



Training 


Open Source


Metarank: real time personalization as a service


Docs | Website | Community Slack | Blog | Demo
Tests: passing | License: Apache2 | Release: v0.5.5 | Slack: join the community


What is Metarank? 


Metarank is a personalization service that can be easily integrated into existing systems and used to personalize 
different types of content. 


Like Instagram's personalized feed that is based on the posts that you've seen and liked, Facebook's new friends 
recommendation widget or Amazon's personalized results, you can add personalization to your application. You can 
combine different features, both user-based like location or gender and item-based like tags with different actions: 
clicks, likes, purchases to create a personalized experience for your users. 


Thanks to Metarank's simple API and YAML configuration, you don't need any prior machine learning experience to
start improving your key metrics and run experiments.


listings, search results, recommendations
that boosts user engagement. A friendly
Learn-to-Rank engine


metarank.ai


Topics: search, kubernetes, data-science, machine-learning, scala, deep-learning, personalization, data-engineering, feature-extraction, ranking, feature-engineering


Apache-2.0 license | 1.5k stars | 13 watching | 53 forks


Releases: 21, latest 0.5.5 (11 minutes ago)


• Apache2 licensed, no strings attached
• Single JAR file, can run locally


Taking off 


1. Import historical events: S3, HTTP, files 
2. Train: LambdaMART @ XGBoost & LightGBM 
3. Inference: API, Redis as backend 


Data model 


Inspired by GCP Retail Events, Segment.io Ecom Spec: 


• Metadata: visitor/item specific info
  - item price, tags, visitor profile
• Impression: visitor viewed an item list
  - search results, collection, rec widget
• Interaction: visitor acted on an item from the list
  - click, add-to-cart, mouse hover


Document metadata example


"event": "item",
{"name": "title", "value": "Nice jeans"},


• Unique event id, item id and timestamp
• Optional document fields
• Partial updates are OK
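
A minimal Python sketch of how such an item metadata event might be assembled and serialized. Only the `"event": "item"` marker and the `name`/`value` field pair come from the slide. The `id`, `item`, `timestamp` and `fields` key names and the `make_item_event` helper are assumptions made for illustration and may not match Metarank's actual schema.

```python
import json
import time
import uuid


def make_item_event(item_id, fields, timestamp_ms=None):
    """Build an item metadata event as a plain dict.

    Key names other than "event", "name" and "value" are assumed.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    return {
        "event": "item",
        "id": str(uuid.uuid4()),         # unique event id
        "item": item_id,                 # item id
        "timestamp": str(timestamp_ms),  # event time, epoch milliseconds
        # Optional document fields; a later event may carry only a subset
        # of them, which is how a partial update would look.
        "fields": [{"name": k, "value": v} for k, v in fields.items()],
    }


event = make_item_event("product1", {"title": "Nice jeans"},
                        timestamp_ms=1650000000000)
payload = json.dumps(event)
```

Sending a second event for `product1` that contains only, say, a `price` field would then model the partial update mentioned above.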


Ranking event example


• User & session fields
• Which items were displayed, BM25 score


Interaction event example


• Multiple interaction types: likes/clicks/purchases
• Must include reference to a parent ranking event
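
The relationship between ranking and interaction events can be sketched in Python as follows. All key names (`user`, `session`, `items`, `relevancy`, `ranking`, `type`) and both helper functions are hypothetical; the sketch only illustrates the rule from the slide that an interaction has to point back at the ranking in which the item was shown.

```python
import uuid


def make_ranking_event(user, session, items):
    """items: list of (item_id, bm25_score) pairs shown to the visitor."""
    return {
        "event": "ranking",
        "id": str(uuid.uuid4()),
        "user": user,
        "session": session,
        # which items were displayed, with the first-stage BM25 score
        "items": [{"id": item_id, "relevancy": score}
                  for item_id, score in items],
    }


def make_interaction_event(ranking, item_id, kind="click"):
    """Build an interaction that references its parent ranking event."""
    shown = {entry["id"] for entry in ranking["items"]}
    if item_id not in shown:
        raise ValueError(f"item {item_id!r} was not shown in ranking {ranking['id']}")
    return {
        "event": "interaction",
        "id": str(uuid.uuid4()),
        "ranking": ranking["id"],  # reference to the parent ranking event
        "user": ranking["user"],
        "session": ranking["session"],
        "type": kind,              # e.g. like, click, purchase
        "item": item_id,
    }


ranking = make_ranking_event("u1", "s1", [("p1", 2.5), ("p2", 1.7)])
click = make_interaction_event(ranking, "p2", kind="click")
```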


Demo: ranklens dataset 


No-code YAML feature setup


Goal: cover 90% most common ML features 


• feature extractors: compute ML feature value
• feature store: add to changelog if changed
• online serving: cache latest value for inference


Feature extractors: basic


- name: budget
  type: number
  scope: item
  source: item.budget


// encode a string field
- name: genre
  type: string
  scope: item
  source: item.genre
  values:
    - comedy
    - drama
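
To make the string extractor more concrete, here is a small Python sketch of two common ways to turn a string field with a fixed `values` list into numbers: one-hot and index encoding. Which encoding is applied to the `genre` feature is not stated on the slide, and the function names are made up for this example.

```python
def one_hot(item_values, vocabulary):
    """One slot per known value; unknown values leave all slots at 0."""
    present = {item_values} if isinstance(item_values, str) else set(item_values)
    return [1.0 if v in present else 0.0 for v in vocabulary]


def index_encode(value, vocabulary):
    """1-based position of the value in the vocabulary, 0 when unknown."""
    return vocabulary.index(value) + 1 if value in vocabulary else 0


GENRES = ["comedy", "drama"]  # the values listed in the config above

drama_vec = one_hot("drama", GENRES)
both_vec = one_hot(["comedy", "drama"], GENRES)
unknown_vec = one_hot("horror", GENRES)
comedy_idx = index_encode("comedy", GENRES)
```

One-hot keeps values independent at the cost of one column per value, while index encoding stays compact, which is convenient for tree-based models.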


Special transformations 


// index encode mobile/desktop/tablet category
// from User-Agent field


• There should be a User-Agent field present in the ranking event
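
As an illustration of the idea, the following Python sketch maps a User-Agent string to a mobile/desktop/tablet category and index-encodes it. The substring heuristics are deliberately crude and are an assumption of this example, not how Metarank parses User-Agent headers.

```python
PLATFORMS = ["mobile", "desktop", "tablet"]


def platform_of(user_agent):
    """Very rough device classification based on User-Agent substrings."""
    if not user_agent:
        return None
    ua = user_agent.lower()
    # Android tablets usually omit the "Mobile" token that phones send.
    if "ipad" in ua or "tablet" in ua or ("android" in ua and "mobile" not in ua):
        return "tablet"
    if "mobi" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


def platform_index(user_agent):
    """Index encoding: 1..3 for a known platform, 0 when the field is missing."""
    platform = platform_of(user_agent)
    return PLATFORMS.index(platform) + 1 if platform else 0


IPHONE = ("Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X) "
          "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.0 "
          "Mobile/15E148 Safari/604.1")
WINDOWS = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36")
IPAD = ("Mozilla/5.0 (iPad; CPU OS 15_0 like Mac OS X) AppleWebKit/605.1.15 "
        "(KHTML, like Gecko) Version/15.0 Safari/604.1")
```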


Counters 


// count how many clicks were done on a product


• Uh-oh, there shouldn't be a global counter!


More counters! 


// A sliding window count of interaction events
- name: item_click_count
  type: window_count
  scope: item
  bucket: 24h // make a counter for each 24h rolling window
  windows: [7, 14, 30, 60] // on each refresh, aggregate to 1-2-4-8 week counts
  refresh: 1h
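
The bucketing idea behind this feature can be sketched in plain Python: clicks are counted per 24-hour bucket, and each window is the sum of the most recent N buckets. The `WindowCounter` class and its in-memory storage are hypothetical and only mirror the `bucket` and `windows` settings from the config. The `refresh` interval and any persistent state are left out.

```python
from collections import defaultdict

DAY = 24 * 3600  # bucket size in seconds


class WindowCounter:
    def __init__(self, bucket_seconds=DAY, windows=(7, 14, 30, 60)):
        self.bucket_seconds = bucket_seconds
        self.windows = windows
        # item id -> bucket index -> number of events in that bucket
        self.buckets = defaultdict(lambda: defaultdict(int))

    def increment(self, item, ts):
        """Register one interaction on `item` at unix time `ts`."""
        self.buckets[item][int(ts // self.bucket_seconds)] += 1

    def values(self, item, now):
        """One count per window, each covering the last N buckets up to `now`."""
        current = int(now // self.bucket_seconds)
        per_bucket = self.buckets.get(item, {})
        return [
            sum(count for bucket, count in per_bucket.items()
                if current - window < bucket <= current)
            for window in self.windows
        ]


counter = WindowCounter()
now = 100 * DAY
for age_days in (0, 3, 10, 40):
    counter.increment("p1", now - age_days * DAY)
counts = counter.values("p1", now)
```

Each item produces several feature values at once, one per window, so the model can compare recent and long-term popularity.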


Rates: CTR & Conversion


  bottom: impression // to number of examine events
  scope: item
  bucket: 24h // aggregate over 24-hour buckets


• Rate normalization: 1 click + 2 impressions != CTR 50%
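
The slide does not say which normalization is used. One common option is additive smoothing, where a prior rate is blended in with a fixed weight so that tiny samples are pulled towards the prior. The sketch below uses that approach, with `prior_rate` and `prior_weight` as illustrative, untuned parameters.

```python
def smoothed_rate(top, bottom, prior_rate=0.1, prior_weight=10.0):
    """Rate with additive smoothing.

    Behaves as if `prior_weight` extra impressions had already been
    observed at `prior_rate`, so small samples cannot produce extreme rates.
    """
    return (top + prior_rate * prior_weight) / (bottom + prior_weight)


few = smoothed_rate(1, 2)        # 1 click on 2 impressions
many = smoothed_rate(500, 1000)  # same raw ratio, far more evidence
cold = smoothed_rate(0, 0)       # no data yet: falls back to the prior
```

With these defaults the 1-of-2 item ends up well below 50%, while the 500-of-1000 item stays close to it.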


Profiling


field: metadata.color


Per-field matching


itemField: item.title
rankingField: ranking.query


• Lucene language-specific tokenization is supported
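
A rough Python sketch of what a per-field match score could look like: tokenize the item field and the query, then measure term overlap. Jaccard similarity and the regex tokenizer are assumptions standing in for whatever scoring and Lucene analyzers are actually used.

```python
import re


def tokenize(text):
    """Lowercased word tokens; a stand-in for a language-specific analyzer."""
    return set(re.findall(r"\w+", text.lower()))


def field_match(item_field, ranking_field):
    """Jaccard overlap between item field terms and query terms."""
    item_terms, query_terms = tokenize(item_field), tokenize(ranking_field)
    if not item_terms or not query_terms:
        return 0.0
    return len(item_terms & query_terms) / len(item_terms | query_terms)


partial = field_match("Nice jeans", "jeans")
exact = field_match("Nice jeans", "nice JEANS")
miss = field_match("Nice jeans", "red socks")
```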


Demo: ranklens config 


Demo: import and training the model 


What has just happened? 


[Diagram: event stream timeline (history to now), feature values, click-through join window]


Implicit judgements 


• Feed all of them into LambdaMART
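
One way to picture how implicit judgements are derived: every ranking event is joined with the interactions that reference it and arrive within a join window, and each displayed item gets label 1 if it was clicked and 0 otherwise. The event shapes, the 30-minute default and the function name are assumptions of this sketch.

```python
def implicit_judgements(rankings, interactions, join_window=1800):
    """Label each displayed item: 1 if clicked within the window, else 0.

    rankings:     [{"id": ..., "ts": seconds, "items": [item ids]}]
    interactions: [{"ranking": parent id, "item": item id, "ts": seconds}]
    """
    by_ranking = {}
    for event in interactions:
        by_ranking.setdefault(event["ranking"], []).append(event)

    judgements = []
    for ranking in rankings:
        clicked = {
            event["item"]
            for event in by_ranking.get(ranking["id"], [])
            if 0 <= event["ts"] - ranking["ts"] <= join_window
        }
        labels = [(item, 1 if item in clicked else 0) for item in ranking["items"]]
        judgements.append({"ranking": ranking["id"], "labels": labels})
    return judgements


rankings = [{"id": "r1", "ts": 1000, "items": ["a", "b", "c"]}]
interactions = [
    {"ranking": "r1", "item": "b", "ts": 1060},  # inside the join window
    {"ranking": "r1", "item": "c", "ts": 8200},  # too late, ignored
]
judgements = implicit_judgements(rankings, interactions)
```

Each entry in `judgements` is one query group: labels paired with the feature values for the same items are the kind of input a LambdaMART trainer expects.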


Demo: sending requests 


[not only] personalization


• Demo: interacted with dynamic features -> dynamic ranking
• Pilot: static features -> precomputed ranking


[not only] reranking


better ranking :)


• soon: recommendations retrieval (MF/BPR/ALS)
• soon: merchandising


• Data collection: event schema, kafka/kinesis/pulsar connectors
• Verification: validation heuristics
• ML Code: LambdaMART now, more later
• Feature extraction: manual & automatic feature engineering


Cloud-native by design


state update


• ops: k8s stateless deployment, up/down scaling
• mlops: ML model retraining, A/B testing


Current status 


https://demo.metarank.ai


• Not MVP: running in prod in pilot projects
• k8s distributed mode, snowplow integration
• A long backlog of ML tasks: click models, LTR, de-biasing


We built Metarank to solve our problem. 


But it may also be useful for you


• Looking for feedback: what should we do next?
• Your unique use-case: what are we doing wrong?


Metarank


• github.com/metarank/metarank
• metarank.ai/slack


Metarank: real time personalization as a service


Docs | Website | Community Slack | Blog | Demo


Release: v0.4.1 | Slack: join the community


License: Apache2 | last commit: last Monday


What is personalization? 


Personalization is showing the same items but in different order for different users. 


The order of posts in FB, photos in Instagram, products in Amazon, and search results in Google is personalized for 
each visitor, as it directly affects user engagement: click rate and conversion. We've done 50+ a/b tests in different 
ecommerce verticals to confirm it. 


If you have items that are presented to a user in a specific order, you can personalize this order to improve your 
product's KPIs. 


Why Metarank? 


A low code Machine Learning service
that personalizes articles, listings, search
results, recommendations to boost user
engagement. A friendly Learn-to-Rank
engine


Releases: 0.4.1 (Latest), 16 days ago, + 14 releases


Questions 